Abstract

The purpose of this paper is to derive optimal rules for sequential mastery tests. In a 
sequential mastery test, the decision is to classify a subject as a master, a nonmaster, or to continue 
sampling and administering another random item. The framework of minimax sequential decision 
theory (minimum information approach) is used; that is, optimal rules are obtained by minimizing 
the maximum expected losses associated with all possible decision rules at each stage of sampling. 
The main advantage of this approach is that costs of sampling can be explicitly taken into account. 
The binomial model is assumed for the probability of a correct response given the true level of 
functioning, whereas threshold loss is adopted for the loss function involved. Monotonicity 
conditions are derived, that is, conditions sufficient for optimal rules to be in the form of sequential 
cutting scores. The paper concludes with a simulation study, in which the minimax sequential 
strategy is compared with other procedures that exist for similar classification decision problems in 
the literature. 



Key words: sequential mastery testing, minimax sequential rules, monotonicity conditions,
least favorable prior, binomial distribution, threshold loss, most efficient strategy.

Introduction

Well-known examples of fixed-length mastery tests include pass/fail decisions in education, 
certification, and successfulness of therapies. The fixed-length mastery problem has been studied 
extensively in the literature within the framework of (empirical) Bayesian decision theory (e.g., De 
Gruijter & Hambleton, 1984; van der Linden, 1990). In addition, optimal rules for the fixed-length 
mastery problem have also been derived within the framework of the minimax strategy (e.g., 
Huynh, 1980; Veldhuijzen, 1982). 

In both approaches, the following two basic elements are distinguished: A psychometric 
model relating the probability of a correct response to student's (unknown) true level of 
functioning, and a loss structure evaluating the total costs and benefits for each possible 
combination of decision outcome and true level of functioning. Within the framework of Bayesian 
decision theory (e.g., DeGroot, 1970; Lehmann, 1959), optimal rules (i.e., Bayes rules) are obtained 
by minimizing the posterior expected losses associated with all possible decision rules. Decision 
rules are hereby prescriptions specifying for each possible observed response pattern what action 
has to be taken. The Bayes principle assumes that prior knowledge about student's true level of 
functioning is available and can be characterized by a probability distribution called the prior. 

Using minimax decision theory (e.g., DeGroot, 1970; Lehmann, 1959), optimal rules (i.e., 
minimax rules) are obtained by minimizing the maximum expected losses associated with all 
possible decision rules. In fact, the minimax principle assumes that it is best to prepare for the 
worst and to establish the maximum expected loss for each possible decision rule (e.g., van der 
Linden, 1981). In other words, the minimax decision rule is a bit conservative and pessimistic 
(Coombs, Dawes, & Tversky, 1970). 

The test at the end of the treatment does not necessarily have to be a fixed-length mastery 
test but might also be a variable-length mastery test. In this case, in addition to the actions declaring 
mastery or nonmastery, also the action of continuing sampling and administering another item is 
available. Variable-length mastery tests are designed with the goal of maximizing the probability of 
making correct classification decisions (i.e., mastery and nonmastery) while at the same time 
minimizing test length (Lewis & Sheehan, 1990). For instance, Ferguson (1969) showed that 
average test lengths could be reduced by half without sacrificing classification accuracy. 

Generally, two main types of variable-length mastery tests can be distinguished. First, both 
the item selection and stopping rule (i.e., the termination criterion) are adaptive. Student's ability 
measured on a latent continuum is estimated after each response, and the next item is selected such 
that its difficulty matches student's last ability estimate. 
Hence, this type of variable-length mastery testing assumes that items differ in difficulty,
and is denoted by Kingsbury and Weiss (1983) as adaptive mastery testing (AMT). 

In the second type of variable-length mastery testing, only the stopping rule is adaptive,
whereas the item to be administered next is selected at random. In the following, this type of variable-length
mastery testing will be denoted as sequential mastery testing (SMT). The purpose of this paper is to 
derive optimal rules for SMT using the framework of minimax sequential decision theory (e.g., 
DeGroot, 1970; Lehmann, 1959). The main advantage of this approach is that costs of sampling 
(i.e., administering another random item) can be explicitly taken into account. 



Review of Existing Procedures to Variable-Length Mastery Testing 

In this section, earlier solutions to both the adaptive and sequential mastery problem will be briefly 
reviewed. First, earlier solutions to AMT will be considered. Next, it will be indicated how SMT 
has been dealt with in the literature. 

Earlier solutions to adaptive mastery testing 

In adaptive mastery testing, two item response theory (IRT)-based strategies have been primarily 
used for selecting the item to be administered next. First, Kingsbury and Weiss (1983) proposed that the
item to be administered next be the one that maximizes the amount of (Fisher's) information at
student's last ability estimate. 

In the second IRT-based approach, the Bayesian item selection strategy, the item that 
minimizes the posterior variance of student's last ability estimate is administered next. In this 
approach, a prior distribution about student’s ability must be specified. If a normal distribution is 
assumed as a prior, an estimate of the posterior distribution of student’s last ability, given observed 
test score, may be obtained via a procedure called restricted Bayesian updating (Owen, 1975). Also, 
posterior variance may be obtained via Owen’s Bayesian scoring algorithm. Nowadays, numerical 
procedures for computing posterior ability and variance do also exist. 

Both IRT-based item selection procedures make use of confidence intervals of student’s 
latent ability for deciding on mastery, nonmastery, or to continue sampling. Decisions are made by 
determining whether or not a prespecified cut-off point on the latent IRT-metric, separating masters 
from nonmasters, falls outside the limits of this confidence interval. 
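
The confidence-interval rule described above can be illustrated with a minimal sketch. The function name `amt_decision`, the use of a symmetric normal-theory interval, and the 95% default are assumptions made for illustration only; the procedures cited above may construct the interval differently.

```python
from statistics import NormalDist

def amt_decision(theta_hat, se, theta_cut, confidence=0.95):
    """Classify from a confidence interval around the current ability estimate.

    theta_hat : latest ability estimate on the latent metric
    se        : its standard error (or posterior standard deviation)
    theta_cut : prespecified cut-off separating masters from nonmasters
    Assumption: a symmetric interval theta_hat +/- z * se is used.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower, upper = theta_hat - z * se, theta_hat + z * se
    if theta_cut < lower:      # cut-off lies below the whole interval
        return "mastery"
    if theta_cut > upper:      # cut-off lies above the whole interval
        return "nonmastery"
    return "continue"          # cut-off inside the interval: administer another item
```

Testing stops as soon as the cut-off falls outside the interval, so test length depends on how quickly the standard error shrinks relative to the distance between the estimate and the cut-off.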

As an aside, as pointed out by Chang and Stout (1993), it may be noted that the posterior
variance converges to the reciprocal of the test information when the number of items goes to 
infinity. Therefore, the two methods of IRT-based item selection strategies should yield similar 
results when the item number is large. 

Existing procedures to the sequential mastery problem 

One of the earliest approaches to sequential mastery testing dates back to Ferguson (1969) using 
Wald's well-known sequential probability ratio test (SPRT), originally developed as a statistical 
quality control test for light bulbs in a manufacturing setting. In Ferguson's approach, the 
probability of a correct response given the true level of functioning (i.e., the psychometric model) is 
modeled as a binomial distribution. The choice of this psychometric model assumes that, given the 
true level of functioning, each item has the same probability of being correctly answered, or that 
items are sampled at random. 

As indicated by Ferguson (1969), three elements must be specified in advance in applying 
the SPRT-framework to sequential mastery testing. First, two values $p_0$ and $p_1$ on the
proportion-correct metric must be specified, representing points that correspond to lower and upper limits of
true level of functioning at which a nonmastery and a mastery decision will be made, respectively.
Also, these two values mark the boundaries of the small region (i.e., indifference region) where we
can never be sure of making the right classification decision and in which, therefore, sampling will
continue. Second, two levels of error acceptance $\alpha$ and $\beta$ must be specified, reflecting the relative
costs of the false positive (i.e., Type I) and false negative (i.e., Type II) error types. Intervals can be 
derived as functions of these two error rates for which mastery and nonmastery is declared, 
respectively, and for which sampling is continued (Wald, 1947). Third, a maximum test length 
must be specified in order to classify within a reasonable period of time those students for whom 
the decision of declaring mastery or nonmastery is not as clear-cut. 
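
The three elements listed above can be put together in a short sketch of a truncated SPRT for binomial responses. This is an illustration under stated assumptions, not Ferguson's implementation: it assumes $p_0 < p_1$, with $p_0$ as the nonmastery hypothesis and $p_1$ as the mastery hypothesis, uses Wald's approximate boundaries, and forces a decision at the maximum test length by comparing the log likelihood ratio with the midpoint of the two boundaries (a hypothetical truncation rule).

```python
import math

def sprt_mastery(responses, p0, p1, alpha, beta, max_length):
    """Truncated SPRT on 0/1 item responses.

    Assumptions: p0 < p1; H0: p = p0 (nonmaster), H1: p = p1 (master);
    alpha = P(declare mastery | p0), beta = P(declare nonmastery | p1).
    Returns (decision, number of items used).
    """
    upper = math.log((1.0 - beta) / alpha)    # at or above: declare mastery
    lower = math.log(beta / (1.0 - alpha))    # at or below: declare nonmastery
    step_correct = math.log(p1 / p0)
    step_incorrect = math.log((1.0 - p1) / (1.0 - p0))
    llr = 0.0
    for n_items, x in enumerate(responses[:max_length], start=1):
        llr += step_correct if x == 1 else step_incorrect
        if llr >= upper:
            return "mastery", n_items
        if llr <= lower:
            return "nonmastery", n_items
        if n_items == max_length:
            # Hypothetical truncation rule: side of the midpoint decides.
            forced = "mastery" if llr > (upper + lower) / 2.0 else "nonmastery"
            return forced, n_items
    # Responses ran out before a boundary or the maximum length was reached.
    return "continue", len(responses[:max_length])
```

With `p0 = 0.6`, `p1 = 0.8`, and `alpha = beta = 0.1`, an unbroken run of correct answers crosses the upper boundary at the eighth item, whereas an unbroken run of incorrect answers crosses the lower boundary at the fourth.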

Reckase (1983) has proposed an alternative approach to sequential mastery testing within an 
SPRT-framework. Unlike Ferguson (1969), Reckase (1983) did not assume that items have equal 
characteristics but allowed them to vary in difficulty and discrimination by using an IRT-model 
instead of a binomial distribution. Modeling response behavior by an IRT model, as in Reckase's
(1983) approach, Spray and Reckase (1996) compared Wald's SPRT procedure with a maximum
information item selection (MIIS) procedure (Kingsbury & Weiss, 1983). The results showed that,
under the conditions studied, the SPRT procedure required fewer test items than the MIIS
procedure to achieve the same level of classification accuracy.
This finding is consistent with Wald’s (1947) conclusion that the SPRT was the uniformly
most powerful test of simple hypotheses. 

Recently, Lewis and Sheehan (1990), Sheehan and Lewis (1992), and Smith and Lewis 
(1995) have applied Bayesian sequential decision theory (e.g., DeGroot, 1970; Lehmann, 1959) to 
SMT. In addition to a psychometric model and a loss function, cost of sampling (i.e., cost of 
administering one additional item) must be explicitly specified in this approach. Doing so, posterior 
expected losses associated with the nonmastery and mastery decisions can now be calculated at 
each stage of sampling. The posterior expected loss associated with continuing sampling, in turn,
is determined by averaging the posterior expected losses associated with
each of the possible future decision outcomes relative to the probability of observing those 
outcomes (i.e., the posterior predictive distributions). 

Optimal rules (i.e., Bayesian sequential rules) are now obtained by choosing the action that 
minimizes posterior expected loss at each stage of sampling using techniques of dynamic 
programming (i.e., backward induction). This technique starts by considering the final stage of 
sampling and then works backward to the first stage of sampling. Backward induction makes use of 
the principle that upon breaking into an optimal procedure at any stage, the remaining portion of the 
procedure is optimal when considered in its own right. Doing so, as pointed out by Lewis and 
Sheehan (1990), the action chosen at each stage of sampling is optimal with respect to the entire 
sequential mastery testing procedure. 

Lewis and Sheehan (1990) and Sheehan and Lewis (1992), as in Reckase's approach, 
modeled response behavior in the form of a 3-parameter logistic (PL) model from IRT. The number 
of possible outcomes of future random item administrations, needed in computing the posterior 
expected loss associated with the continuing sampling option, can quickly become quite large.
Lewis and Sheehan (1990), therefore, made the simplification that the number-correct score in the 
3-PL model is sufficient for calculating the posterior predictive distributions rather than the entire 
pattern of item responses. 

As an aside, it may be noted that Lewis and Sheehan (1990), Sheehan and Lewis (1992), 
and Smith and Lewis (1995) used testlets (i.e., blocks of items) rather than single items. 

Vos (1999) also applied the framework of Bayesian sequential decision theory to SMT. As 
in Ferguson's (1969) approach, however, the binomial distribution instead of an IRT model is
considered for modeling response behavior. 
It is shown that for the binomial distribution, in combination with the assumption that prior
knowledge about student's true level of functioning can be represented by a beta prior (i.e., its 
natural conjugate), the number-correct score is sufficient to calculate the posterior expected losses 
at future stages of item administrations. Unlike the Lewis and Sheehan (1990) model, therefore, no 
simplifications are necessary to deal with the combinatorial problem of the large number of 
possible decision outcomes of future item administrations. 



Minimax Sequential Decision Theory Applied to SMT 

In this section, the framework of minimax sequential decision theory (e.g., DeGroot, 1970; 
Lehmann, 1959) will be treated in more detail. Also, a rationale is provided for why this approach 
should be preferred above the Bayesian sequential principle. 

Framework of minimax sequential decision theory 

In minimax sequential decision theory, optimal rules (i.e., minimax sequential rules) are found by 
minimizing the maximum expected losses associated with all possible decision rules at each stage 
of sampling. Analogous to Bayesian sequential decision theory, cost per observation is also
explicitly taken into account in this approach. Hence, the maximum expected losses associated
with the mastery and nonmastery decisions can be calculated at each stage of sampling. The 
maximum expected loss associated with the continuing sampling option is computed by averaging 
the maximum expected losses associated with each of the possible future decision outcomes 
relative to the posterior predictive probability of observing those outcomes. 

Unlike Bayesian sequential decision theory, specification of a prior is not needed in 
applying the minimax sequential principle. A minimax sequential rule, however, can be conceived 
of as a rule that is based on minimization of posterior expected loss as well (i.e., as a Bayesian
sequential rule), but under the restriction that the prior is the least favorable element of the class of
priors (e.g., Ferguson, 1967).

Rationale for preferring the minimax principle above the Bayesian principle

The question can be raised why minimax sequential decision theory should be preferred above the 
Bayesian sequential principle. As pointed out by Huynh (1980), the minimax (sequential) principle 
is very attractive when the only information is student's observed number-correct score; that is, no 
group data of 'comparable' students who will take the same test or prior information about the 
individual student is available. The minimax strategy, therefore, is sometimes also denoted as a 
minimum information approach (e.g., Veldhuijzen, 1982). 

If group data of 'comparable' students or prior information about the individual student is 
available, however, it is better to use this information. Hence, in this situation it is better to use 
Bayesian instead of minimax sequential decision theory. Even if information in the form of group 
data of 'comparable' students or prior information about the individual student is available, it is
sometimes too difficult to express this information in the form of a prior distribution
(Veldhuijzen, 1982). In these circumstances, the minimax sequential procedure may also be more 
appropriate. 



Notation 

Within the framework of both minimax and Bayesian sequential decision theory, optimal rules can 
be obtained without specifying a maximum test length. In the following, however, a sequential 
mastery test is supposed to have a maximum test length $n$ ($n \geq 1$). As pointed out by Ferguson
(1969), a maximum test length is needed in order to classify within a reasonable period of time 
those students for whom the decision of declaring mastery or nonmastery is not as clear-cut. 

Let the observed item response at each stage of sampling $k$ ($1 \leq k \leq n$) for a randomly
sampled student be denoted by a discrete random variable $X_k$, with realization $x_k$. The observed
response variables $X_1, \ldots, X_k$ are assumed to be independent and identically distributed for each value
of $k$, and take the values 0 and 1 for incorrect and correct responses to the $k$-th item, respectively.
Furthermore, let the observed number-correct score be denoted by a discrete random variable
$S_k = X_1 + \cdots + X_k$, with realization $s_k = x_1 + \cdots + x_k$ ($0 \leq s_k \leq k$).




Student's true level of functioning is unknown due to measurement and sampling error. All
that is known is his/her observed number-correct score $s_k$. In other words, the mastery test is not a
perfect indicator of student's true performance. Therefore, let student's true level of functioning be
denoted by a continuous random variable $T$ on the latent proportion-correct metric, with realization
$t \in [0, 1]$.

Finally, a criterion level $t_c$ ($0 \leq t_c \leq 1$) on the true level of functioning scale $T$ can be
identified. A student is considered a true nonmaster and true master if his/her true level of
functioning $t$ is smaller or larger than $t_c$, respectively. The criterion level must be specified in
advance by the decision-maker. Several methods for setting standards on the observed score level
have been proposed in the literature (e.g., Angoff, 1971; Nedelsky, 1954). However, these standard
setting methods do not apply to the true level of functioning $T$. The criterion level $t_c$ on the true
level of functioning $T$, therefore, must be set by content experts by indicating the minimal
percentage of the total domain of items a student must be able to answer correctly in order to be
granted mastery status.

Assuming $X_1 = x_1, \ldots, X_k = x_k$ has been observed, the two basic elements of minimax
sequential decision making discussed earlier can now be formulated as follows: a psychometric
model $f(s_k \mid t)$ relating observed number-correct score $s_k$ to student's true level of functioning $t$ at
each stage of sampling $k$, and a loss function describing the loss $l(a_i(x_1, \ldots, x_k), t)$ incurred when
action $a_i(x_1, \ldots, x_k)$ is taken for the student whose true level of functioning is $t$. The actions
nonmastery, mastery, and to continue sampling will be denoted as $a_0(x_1, \ldots, x_k)$, $a_1(x_1, \ldots, x_k)$, and
$a_2(x_1, \ldots, x_k)$, respectively.



Generally speaking, as noted before, a loss function evaluates the total costs and benefits of 
all possible decision outcomes for a student whose true level of functioning is t. These costs may 
concern all relevant psychological, social, and economic consequences which the decision brings 
along. The Bayesian as well as minimax approach allows the decision-maker to incorporate into the 
decision process the costs of misclassifications (i.e., students for whom the wrong decision is 
made). As in Hambleton and Novick (1973), here the well-known threshold loss function is 
adopted as the loss structure involved. 



Threshold Loss 

The choice of this loss function implies that the "seriousness" of all possible
consequences of the decisions can be summarized by possibly different constants, one for each of
the possible classification outcomes. 

For the sequential mastery problem, a threshold loss function can be formulated as a natural 
extension of the one for the fixed-length mastery problem at each stage of sampling k as follows 
(see also Lewis & Sheehan, 1990): 

Insert Table 1 about here 

The value e represents the costs of administering one random item. For the sake of 
simplicity, following Lewis and Sheehan (1990), these costs are assumed to be equal for each 
classification outcome as well as for each sampling occasion. Of course, these two assumptions can 
be relaxed in specific sequential mastery testing applications. Applying an admissible positive 
linear transformation (e.g., Luce & Raiffa, 1957), and assuming the losses $l_{00}$ and $l_{11}$ associated
with the correct classification outcomes are equal and take the smallest values, the threshold loss
function in Table 1 was rescaled in such a way that $l_{00}$ and $l_{11}$ were equal to zero. Hence, the losses
$l_{01}$ and $l_{10}$ must take positive values.

Note that no losses need to be specified in Table 1 for the continuing sampling action
($a_2(x_1, \ldots, x_k)$). This is because the maximum expected loss associated with the continuing sampling
option is computed at each stage of sampling as a weighted average of the maximum expected 
losses associated with the classification decisions (i.e., mastery/nonmastery) of future item 
administrations with weights equal to the probabilities of observing those outcomes. 

The ratio $l_{10}/l_{01}$ is denoted as the loss ratio $R$, and refers to the relative losses for declaring
mastery to a student whose true level of functioning is below $t_c$ (i.e., false positive) and declaring
nonmastery to a student whose true level of functioning exceeds $t_c$ (i.e., false negative).
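
As an illustration of the rescaled threshold loss just described, the sketch below returns the loss for a classification action at stage $k$. Table 1 itself is not reproduced here, so the layout is an assumption inferred from the text: zero loss for correct classifications, $l_{01}$ for a false negative, $l_{10}$ for a false positive, and sampling costs $ke$ added to every outcome. Treating a student with $t$ exactly equal to $t_c$ as a nonmaster is also an assumption.

```python
def threshold_loss(action, t, t_c, k, l01, l10, e):
    """Rescaled threshold loss (l00 = l11 = 0) plus sampling costs k * e.

    action : 0 = declare nonmastery, 1 = declare mastery
    l01    : loss for declaring nonmastery when t exceeds t_c (false negative)
    l10    : loss for declaring mastery when t is below t_c (false positive)
    Assumption: t == t_c is treated as a nonmaster.
    """
    if action not in (0, 1):
        raise ValueError("action must be 0 (nonmastery) or 1 (mastery)")
    is_master = t > t_c
    if action == 1:
        misclassification = 0.0 if is_master else l10
    else:
        misclassification = l01 if is_master else 0.0
    return misclassification + k * e

def loss_ratio(l01, l10):
    """Loss ratio R = l10 / l01."""
    return l10 / l01
```

A loss ratio above 1 says that false positives are regarded as more serious than false negatives.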

The loss parameters $l_{ij}$ ($i, j = 0, 1$; $i \neq j$) associated with the incorrect decisions have to be
empirically assessed, for which several methods have been proposed in the literature. Most texts on 
decision theory, however, propose lottery methods (e.g., Luce & Raiffa, 1957) for assessing loss 
functions empirically. In general, the consequences of each pair of actions and true level of 
functioning are scaled in these methods by looking at the most and least preferred outcomes. But, in 
principle, any psychological scaling method can be used. 

Psychometric Model

As earlier remarked, here the well-known binomial model will be adopted for specifying the
statistical relation between the observed number-correct score $s_k$ and student's true level of
functioning $t$. Its distribution $f(s_k \mid t)$ at stage $k$ of sampling, given student’s true level of functioning
$t$, can be written as follows:



$$f(s_k \mid t) = \binom{k}{s_k} t^{s_k} (1 - t)^{k - s_k}. \tag{1}$$



If each response is independent of the other, and if the examinee's probability of a correct 
answer remains constant, the distribution function of $s_k$, given student’s true level of functioning $t$,
is given by Equation 1 (Wilcox, 1981). The binomial model assumes that the test given to each 
student is a random sample of items drawn from a large (real or imaginary) item pool (Wilcox, 
1981). Therefore, for each student a new random sample of items must be drawn in practical 
applications of the sequential mastery problem. 
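
The binomial model of Equation 1 can be evaluated directly. The following is a minimal sketch; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(s_k, k, t):
    """Probability of number-correct score s_k after k items, given true level t."""
    return comb(k, s_k) * t**s_k * (1.0 - t)**(k - s_k)

# Distribution of the number-correct score after 10 items for a student with t = 0.7
distribution = [binomial_pmf(s, 10, 0.7) for s in range(11)]
```

The probabilities over all possible scores 0, ..., $k$ sum to one.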



Sufficient Conditions for Minimax Sequential Rules to be Monotone 

Linking up with common practice in mastery testing, minimax sequential rules in this paper are 
assumed to have monotone forms. Decision rules in practical situations in education and 
psychology usually take the form of selecting one or more cutting scores on the test. Decision rules 
of this form constitute a special subclass known as monotone rules (Ferguson, 1967, sect. 6.1). In
other words, a decision rule is monotone if cutting scores are used to partition the test scores into 
intervals for which different actions are taken. As a result, monotone sequential rules can be 
defined on the number-correct score metric in the form of sequential cutting scores. The restriction 
to monotone rules, however, is correct only if it can be proven that for any nonmonotone rule for 
the problem at hand there is a monotone rule with at least the same value on the criterion of 
optimality used (Ferguson, 1967, p.55). Using a minimax sequential rule, as noted before, the 
minimum of the maximum expected losses associated with all possible decision rules is taken as 
the criterion of optimality at each stage of sampling. 
As noted before, the maximum expected loss for continuing sampling is hereby determined
by averaging the maximum expected losses associated with each of the possible future decision 
outcomes relative to the probability of observing those outcomes. Therefore, it follows immediately 
that the conditions sufficient for setting cutting scores for the fixed-length mastery problem are also 
sufficient for the sequential mastery problem at each stage of sampling. 

Generally, conditions sufficient for setting cutting scores for the fixed-length mastery 
problem are given in Ferguson (1967). First, $f(s_k \mid t)$ must have a monotone likelihood ratio (MLR);
that is, it is required that for any $t_1 > t_2$, the likelihood ratio $f(s_k \mid t_1) / f(s_k \mid t_2)$ is a nondecreasing
function of $s_k$. MLR implies that the higher the observed number-correct score, the more likely it
will be that the true level of functioning is high too. Second, the condition of monotonic loss must 
hold; that is, there must be an ordering of the actions such that for each pair of adjacent actions the 
loss functions possess at most one point of intersection. 

In our example the binomial density function is chosen as the psychometric model $f(s_k \mid t)$.
Since the binomial model belongs to the monotone likelihood ratio family (Ferguson, 1967, Chap.
5), it then follows that the condition of MLR is satisfied. Furthermore, by choosing $l_{00} = l_{11} = 0$ and
assuming positive values for $l_{01}$ and $l_{10}$, it follows that for each pair of adjacent actions the loss
functions do not possess a point of intersection. Hence, it follows immediately that the condition of
monotonic loss is also satisfied at each stage of sampling $k$.
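
The MLR property of the binomial model can also be checked directly from Equation 1. The short derivation below is a worked sketch rather than part of the original argument; it assumes $0 < t_2 < t_1 < 1$ so that the ratio is well defined.

```latex
\frac{f(s_k \mid t_1)}{f(s_k \mid t_2)}
  = \frac{\binom{k}{s_k}\, t_1^{s_k} (1 - t_1)^{k - s_k}}
         {\binom{k}{s_k}\, t_2^{s_k} (1 - t_2)^{k - s_k}}
  = \left( \frac{1 - t_1}{1 - t_2} \right)^{k}
    \left[ \frac{t_1 (1 - t_2)}{t_2 (1 - t_1)} \right]^{s_k}
```

Because $t_1 > t_2$ implies $t_1(1 - t_2) - t_2(1 - t_1) = t_1 - t_2 > 0$, the bracketed base exceeds 1, so the ratio increases with $s_k$.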



Optimizing Rules for the Sequential Mastery Problem 

In this section, it will be shown how optimal rules for SMT can be derived using the framework of 
minimax sequential decision theory. Doing so, given an observed item response vector $(x_1, \ldots, x_k)$,
first the minimax principle will be applied to the fixed-length mastery problem by determining
which of the maximum expected losses associated with the two classification actions $a_0(x_1, \ldots, x_k)$ or
$a_1(x_1, \ldots, x_k)$ is the smallest. Next, applying the minimax principle again, optimal rules for the
sequential mastery problem are derived at each stage of sampling $k$ by comparing this quantity with
the maximum expected loss associated with action $a_2(x_1, \ldots, x_k)$ (i.e., continuing sampling).





Applying the minimax principle to the fixed-length mastery problem

Given $X_1 = x_1, \ldots, X_k = x_k$, as noted before, the minimax decision rule for the fixed-length mastery
problem can be found by minimizing the maximum expected losses associated with the two
classification actions $a_0(x_1, \ldots, x_k)$ and $a_1(x_1, \ldots, x_k)$. It is assumed that there exists a cutting score on $S_k$,
say $s_c(k)$ ($0 \leq s_c(k) \leq k$), such that mastery is declared when $s_k \geq s_c(k)$ and that nonmastery is
declared otherwise. Let $y = 0, 1, \ldots, k$ represent all possible values the number-correct score $s_k$ can
take after having observed $k$ item responses. Assuming the conditions of monotonicity are satisfied,
it can then easily be verified from Table 1 and Equation 1 that mastery ($a_1(x_1, \ldots, x_k)$) is declared
when the maximum loss associated with the mastery decision is smaller than the maximum loss
associated with the nonmastery decision, or, equivalently, when number-correct score $s_k$ is such that



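$$
\begin{aligned}
& \sup_{t < t_c} (l_{10} + ke) \sum_{y = s_k}^{k} \binom{k}{y} t^{y} (1 - t)^{k - y}
  + \sup_{t > t_c} (ke) \sum_{y = 0}^{s_k - 1} \binom{k}{y} t^{y} (1 - t)^{k - y} \\
& \quad < \sup_{t < t_c} (ke) \sum_{y = s_k}^{k} \binom{k}{y} t^{y} (1 - t)^{k - y}
  + \sup_{t > t_c} (l_{01} + ke) \sum_{y = 0}^{s_k - 1} \binom{k}{y} t^{y} (1 - t)^{k - y},
\end{aligned}
\tag{2}
$$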

and that nonmastery ($a_0(x_1, \ldots, x_k)$) is declared otherwise. Since the cumulative binomial distribution
function is decreasing in $t$, it follows that the inequality in (2) can be written as:



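$$
\begin{aligned}
& (l_{10} + ke) \sum_{y = s_k}^{k} \binom{k}{y} t_c^{y} (1 - t_c)^{k - y}
  + (ke) \sum_{y = 0}^{s_k - 1} \binom{k}{y} t_c^{y} (1 - t_c)^{k - y} \\
& \quad < (ke) \sum_{y = s_k}^{k} \binom{k}{y} t_c^{y} (1 - t_c)^{k - y}
  + (l_{01} + ke) \sum_{y = 0}^{s_k - 1} \binom{k}{y} t_c^{y} (1 - t_c)^{k - y}.
\end{aligned}
\tag{3}
$$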

Rearranging terms, it follows that mastery is declared when number-correct score $s_k$ is such that:



$$\sum_{y = s_k}^{k} \binom{k}{y} t_c^{y} (1 - t_c)^{k - y} < 1/(1 + R), \tag{4}$$



where $R$ denotes the loss ratio (i.e., $R = l_{10}/l_{01}$). If the inequality in (4) is not satisfied, nonmastery is
declared. 
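
The fixed-length rule can be turned into a cutting score by searching for the smallest number-correct score whose upper binomial tail at the criterion level falls below $1/(1 + R)$. The sketch below is an illustration under that reading; the function names are hypothetical, and returning `None` when no score qualifies is a design choice made here.

```python
from math import comb

def upper_tail(s, k, t):
    """P(S_k >= s | t) under the binomial model."""
    return sum(comb(k, y) * t**y * (1.0 - t)**(k - y) for y in range(s, k + 1))

def minimax_cutting_score(k, t_c, loss_ratio):
    """Smallest number-correct score at which mastery is declared after k items.

    Mastery is declared when the upper tail at t_c is below 1 / (1 + R).
    The tail decreases as the score increases, so the first score that
    satisfies the inequality acts as the cutting score.
    Returns None if even a perfect score does not satisfy the inequality.
    """
    bound = 1.0 / (1.0 + loss_ratio)
    for s in range(k + 1):
        if upper_tail(s, k, t_c) < bound:
            return s
    return None
```

For example, with 10 items, a criterion level of 0.7, and equal losses for both error types, the search stops at a score of 8; raising the loss ratio to 3 makes the rule stricter and moves the cutting score to 9.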

Derivation of minimax sequential rules 

Let $d_k(x_1, \ldots, x_k)$ denote the action $a_0(x_1, \ldots, x_k)$ or $a_1(x_1, \ldots, x_k)$ yielding the minimum of the maximum
expected losses associated with these two classification actions, and let the maximum expected loss
associated with this minimum be denoted as $V_k(x_1, \ldots, x_k)$. These notations can also be generalized to
the situation that no observations have been taken yet; that is, $d_0(x_0)$ denotes the action $a_0(x_0)$ or
$a_1(x_0)$ which yields the smallest of the maximum expected losses associated with these two actions,
and $V_0(x_0)$ denotes the smallest maximum expected loss associated with $d_0(x_0)$.

Minimax sequential rules can now be found by using the following backward induction 
computational scheme: First, the minimax sequential rule at the final stage of sampling n is 
computed. Since the continuing sampling option is not available at this stage of sampling, it follows 
immediately that the minimax sequential rule is given by $d_n(x_1, \ldots, x_n)$; its associated maximum
expected loss is given by $V_n(x_1, \ldots, x_n)$.

Subsequently, the minimax sequential rule at the next to last stage of sampling $(n-1)$ is
computed by comparing $V_{n-1}(x_1, \ldots, x_{n-1})$ with the maximum expected loss associated with action
$a_2(x_1, \ldots, x_{n-1})$ (i.e., continuing sampling). As noted before, the maximum expected loss associated
with taking one more observation, given a response pattern $(x_1, \ldots, x_{n-1})$, is computed by averaging
the maximum expected losses associated with each of the possible future decision outcomes at the
final stage $n$ relative to the probability of observing those outcomes (i.e., backward induction).

Let $P(X_n = x_n \mid x_1, \ldots, x_{n-1})$ denote the distribution of $X_n$, given the observed item response
vector $(x_1, \ldots, x_{n-1})$; then the maximum expected loss associated with taking one more observation
after $(n-1)$ observations have been taken, $E[V_n(x_1, \ldots, x_{n-1}, X_n) \mid x_1, \ldots, x_{n-1}]$, is computed as follows:

$$E[V_n(x_1, \ldots, x_{n-1}, X_n) \mid x_1, \ldots, x_{n-1}] = \sum_{x_n = 0}^{1} V_n(x_1, \ldots, x_n) \, P(X_n = x_n \mid x_1, \ldots, x_{n-1}). \tag{5}$$

Generally, $P(X_k = x_k \mid x_1, \ldots, x_{k-1})$ is called the posterior predictive distribution of $X_k$ at stage $(k-1)$ of
sampling. It will be indicated later on how this conditional distribution can be computed.

Given a response pattern $(x_1, \ldots, x_{n-1})$, the minimax sequential rule at stage $(n-1)$ of sampling
is now given by: Take one more observation if $E[V_n(x_1, \ldots, x_{n-1}, X_n) \mid x_1, \ldots, x_{n-1}]$ is smaller than
$V_{n-1}(x_1, \ldots, x_{n-1})$, and take action $d_{n-1}(x_1, \ldots, x_{n-1})$ otherwise. If $E[V_n(x_1, \ldots, x_{n-1}, X_n) \mid x_1, \ldots, x_{n-1}]$ and
$V_{n-1}(x_1, \ldots, x_{n-1})$ are equal to each other, it does not matter whether or not the decision-maker takes
one more observation.

To compute the maximum expected loss associated with the continuing sampling option, it
is convenient to introduce the risk at each stage of sampling $k$, which will be denoted as $R_k(x_1, \ldots, x_k)$.
Let the risk at stage $n$ of sampling be defined as $V_n(x_1, \ldots, x_n)$. Generally, given a response pattern
$(x_1, \ldots, x_{k-1})$, the risk at stage $(k-1)$ is then computed inductively as a function of the risk at stage $k$ as
follows:



/?m(xi,...,x*.i) = min{ V*.i(xi,...,x*_i), £[/?*(xi,..,**-i, X k ) | xi,...,x*.i]}. (6) 

The maximum expected loss associated with taking one more observation after (n- 2) 
observations, E\R H .\(x\,...jc n . 2 , X n .i) \ xi,...,x n . 2 ], can then be computed as the expected risk at stage 
(n-1) as follows: 

£[tfn-l(*l> - >*n-2>*n-l) | = 

x n- 1=1 

E /? n _l(xi,...,x n _i)P(X n _i =x n _i | xi,...,x n _ 2 ). (7) 

x n- 1=0 

Given (xi,...,x„. 2 ), the minimax sequential rule at stage (n-2) of sampling is now given by: 
Take one more observation if £[/?„.,(*, ,...,x„. 2 , X„.,) | x,,...,x„. 2 ] is smaller than V„- 2 (x,,...,x„. 2 ); 
otherwise, action <7 ,i- 2 (xi,...,x„. 2 ) is taken. In the case of equality between V n . 2 (xi,...,x„. 2 ) and 
£[/? n -i(xi,...,x„. 2 , X n . | ) | X|,...,x„. 2 ], it does not matter again whether or not the decision-maker takes 
one more observation. 
Following the same computational backward scheme as in determining the minimax
sequential rules at stages $(n-1)$ and $(n-2)$, the minimax sequential rules at stages $(n-3),\ldots,1,0$ are
computed. The minimax sequential rule at stage 0 denotes the decision whether or not to take at
least one observation.



Computation of Posterior Predictive Probabilities

As can be seen from (5) and (7), the posterior predictive distribution $P(X_k = x_k \mid x_1,\ldots,x_{k-1})$ is
needed for computing the maximum expected loss associated with taking one more observation at
stage $(k-1)$ of sampling. From Bayes' theorem, it follows that:

$$P(X_k = x_k \mid x_1,\ldots,x_{k-1}) = \frac{P(X_1 = x_1,\ldots,X_k = x_k)}{P(X_1 = x_1,\ldots,X_{k-1} = x_{k-1})}. \tag{8}$$

For the binomial distribution as the psychometric model involved and assuming the beta
distribution $B(\alpha,\beta)$ as prior with parameters $\alpha$ and $\beta$ ($\alpha, \beta > 0$), it is known (e.g., Keats & Lord,
1962) that the unconditional distribution of $(X_1,\ldots,X_k)$ is equal to:

$$P(X_1 = x_1,\ldots,X_k = x_k) = \frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+s_k)\,\Gamma(\beta+k-s_k)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+k)}, \tag{9}$$

where $\Gamma$ is the usual gamma function. From (8)-(9) it then follows that the posterior predictive
distribution of $X_k$, given a response pattern $(x_1,\ldots,x_{k-1})$, can be written as:

$$P(X_k = x_k \mid x_1,\ldots,x_{k-1}) = \frac{\Gamma(\alpha+s_k)\,\Gamma(\beta+k-s_k)\,\Gamma(\alpha+\beta+k-1)}{\Gamma(\alpha+s_{k-1})\,\Gamma(\beta+k-1-s_{k-1})\,\Gamma(\alpha+\beta+k)}. \tag{10}$$

Using the well-known identity $\Gamma(j+1) = j\,\Gamma(j)$ and the fact that $s_k = s_{k-1}$ and $s_k = s_{k-1}+1$ for
$x_k = 0$ and 1, respectively, it follows from (10) that:

$$P(X_k = x_k \mid x_1,\ldots,x_{k-1}) =
\begin{cases}
(\beta+k-1-s_{k-1})/(\alpha+\beta+k-1) & \text{if } x_k = 0 \\
(\alpha+s_{k-1})/(\alpha+\beta+k-1) & \text{if } x_k = 1.
\end{cases} \tag{11}$$
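As a check on the step from (10) to (11), the following minimal sketch evaluates the posterior predictive probability both ways and compares them. The function names are illustrative assumptions and are not taken from the MINIMAX program; the gamma-ratio form is evaluated on the log scale with `math.lgamma` to avoid overflow.

```python
import math

def predictive_closed_form(x_k, s_prev, k, alpha, beta):
    """P(X_k = x_k | s_{k-1} = s_prev) from the closed form (11)."""
    denom = alpha + beta + k - 1
    if x_k == 1:
        return (alpha + s_prev) / denom
    return (beta + k - 1 - s_prev) / denom

def predictive_gamma_ratio(x_k, s_prev, k, alpha, beta):
    """Same probability from the gamma-function ratio (10), via log-gamma."""
    s_k = s_prev + x_k  # number-correct score after item k
    log_num = (math.lgamma(alpha + s_k)
               + math.lgamma(beta + k - s_k)
               + math.lgamma(alpha + beta + k - 1))
    log_den = (math.lgamma(alpha + s_prev)
               + math.lgamma(beta + k - 1 - s_prev)
               + math.lgamma(alpha + beta + k))
    return math.exp(log_num - log_den)

# Example values (assumed for illustration): item k = 10, 6 correct so far.
p_correct = predictive_closed_form(1, 6, 10, alpha=0.5, beta=1.0)
p_incorrect = predictive_closed_form(0, 6, 10, alpha=0.5, beta=1.0)
```

The two probabilities for $x_k = 0$ and $x_k = 1$ sum to one, which is a quick sanity check on any implementation of (11).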
Illustration of Computing the Appropriate Action

To illustrate the computation of the appropriate action (i.e., mastery, nonmastery,
continuation) outlined above, suppose that the maximum test length is 25 (i.e., $n = 25$). First, the
appropriate classification decision (i.e., declare mastery or nonmastery) and its associated
maximum expected loss at the final stage of testing, $d_{25}(x_1,\ldots,x_{25})$ and $V_{25}(x_1,\ldots,x_{25})$, are
computed for all possible values of $s_{25}$ (i.e., $s_{25} = 0,\ldots,25$). More specifically, mastery is declared
for those values of $s_{25}$ for which the inequality in (4) holds, while nonmastery is declared otherwise.

Likewise, the appropriate classification decision and its associated maximum expected loss
are computed after 24 items have been administered (i.e., $d_{24}(x_1,\ldots,x_{24})$ and $V_{24}(x_1,\ldots,x_{24})$) for
$s_{24} = 0,\ldots,24$. Next, the maximum expected loss associated with administering one more random item
after 24 items have been administered, $E[R_{25}(x_1,\ldots,x_{24},X_{25}) \mid x_1,\ldots,x_{24}]$, is computed using (5) and
(11) for $s_{24} = 0,\ldots,24$. Another random item is administered if this value is smaller than
$V_{24}(x_1,\ldots,x_{24})$; otherwise, classification decision $d_{24}(x_1,\ldots,x_{24})$ is taken.

For computing the appropriate action after 23 items have been administered, in addition to
computing $d_{23}(x_1,\ldots,x_{23})$ and $V_{23}(x_1,\ldots,x_{23})$ for $s_{23} = 0,\ldots,23$, the risk $R_{24}(x_1,\ldots,x_{24})$ at stage 24 of
testing is computed using (6) for $s_{24} = 0,\ldots,24$. The maximum expected loss associated with
administering one more random item after 23 items have been administered,
$E[R_{24}(x_1,\ldots,x_{23},X_{24}) \mid x_1,\ldots,x_{23}]$, can then be computed as the expected risk using (7) and (11) for
$s_{23} = 0,\ldots,23$. One more random item is now administered if this value is smaller than
$V_{23}(x_1,\ldots,x_{23})$; otherwise, classification decision $d_{23}(x_1,\ldots,x_{23})$ is taken. Similarly, the appropriate
action is determined at stage 22 down to stage 0 of testing.
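The backward scheme of (6) and (7) can be sketched in a few lines once the stopping loss $V_k$ is available as a function of $(s_k, k)$. The sketch below is not the MINIMAX program. As a loudly labeled assumption, it replaces the minimax quantity $V_k$ from (4), which is not reproduced in this section, by the Bayesian posterior expected loss under a uniform $B(1,1)$ prior, with $t_c = 0.6$, $l_{10} = l_{01} = 200$, and $e = 1$. All function names are hypothetical.

```python
from math import comb

def posterior_cdf_uniform_prior(s, k, t_c):
    """P(T <= t_c | s_k = s) under a uniform Beta(1, 1) prior.

    The posterior is Beta(1 + s, 1 + k - s); for integer parameters its
    distribution function equals a binomial upper tail.
    """
    n, m = k + 1, s + 1
    return sum(comb(n, x) * t_c**x * (1 - t_c)**(n - x) for x in range(m, n + 1))

def stop_loss(s, k, t_c=0.6, l10=200.0, l01=200.0, e=1.0):
    """ASSUMED stand-in for V_k and d_k: Bayes posterior expected loss, not (4)."""
    p_non = posterior_cdf_uniform_prior(s, k, t_c)
    loss_mastery = k * e + l10 * p_non
    loss_nonmastery = k * e + l01 * (1.0 - p_non)
    if loss_mastery < loss_nonmastery:
        return loss_mastery, "mastery"
    return loss_nonmastery, "nonmastery"

def backward_induction(n):
    """Return dicts risk[(k, s)] and action[(k, s)] for k = 0..n, s = 0..k."""
    risk, action = {}, {}
    for s in range(n + 1):                  # final stage: R_n = V_n
        risk[(n, s)], action[(n, s)] = stop_loss(s, n)
    for k in range(n - 1, -1, -1):          # stages n-1, ..., 0
        for s in range(k + 1):
            v, d = stop_loss(s, k)
            p1 = (1 + s) / (2 + k)          # (11) with alpha = beta = 1, next item k+1
            expected = (1 - p1) * risk[(k + 1, s)] + p1 * risk[(k + 1, s + 1)]
            if expected < v:                # strict inequality, as in the rule above
                risk[(k, s)], action[(k, s)] = expected, "continue"
            else:
                risk[(k, s)], action[(k, s)] = v, d
    return risk, action

risk, action = backward_induction(25)
```

Because the tables are indexed by the number-correct score rather than by full response patterns, the work grows quadratically in the maximum test length. Substituting the paper's own $V_k$ and the least favorable prior in `stop_loss` and `p1` would give the minimax version of the same recursion.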



Determination of the Least Favorable Prior 

To be able to compute the posterior predictive distribution $P(X_k = x_k \mid x_1,\ldots,x_{k-1})$, the form of
the assumed beta prior $B(\alpha,\beta)$ must be specified further; that is, the numerical values of
its parameters $\alpha$ and $\beta$ ($\alpha, \beta > 0$) must be determined. In the present paper the least favorable prior
will be taken for $B(\alpha,\beta)$. As will be shown in this section, this prior results if the value 1 is taken
for $\beta$ and if $\alpha$ is taken sufficiently small. It should be noted, however, that other forms of the beta prior
(e.g., the uniform prior with $\alpha = \beta = 1$) might also be considered in computing the posterior
predictive distribution.
Let $I_p(r,s)$ denote the incomplete beta function with parameters $r$ and $s$ ($r, s > 0$). It has
been known for some time that

$$\sum_{x=m}^{n} \binom{n}{x} p^x (1-p)^{n-x} = I_p(m, n-m+1). \tag{12}$$

Hence, the inequality in (4) can be written as:

$$I_{t_c}(s_k, k-s_k+1) < 1/(1+R). \tag{13}$$
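Identity (12) can be checked numerically. The sketch below compares the binomial upper tail with the incomplete beta function, where the latter is obtained by composite Simpson integration of the beta density. Restricting to positive integer parameters (so $m \ge 1$) and the choice of integration rule are assumptions made here to keep the example within the standard library.

```python
from math import comb, factorial

def binomial_upper_tail(m, n, p):
    """Left-hand side of (12): sum over x = m..n of C(n, x) p^x (1-p)^(n-x)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(m, n + 1))

def incomplete_beta_int(p, r, s, steps=2000):
    """I_p(r, s) for positive integers r, s by composite Simpson integration.

    `steps` must be even.
    """
    norm = factorial(r + s - 1) / (factorial(r - 1) * factorial(s - 1))  # 1 / B(r, s)

    def density(t):
        return t**(r - 1) * (1 - t)**(s - 1)

    h = p / steps
    total = density(0.0) + density(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return norm * total * h / 3

# Example: k = 25 items, s_k = 15 correct, criterion level t_c = 0.6.
lhs = binomial_upper_tail(15, 25, 0.6)
rhs = incomplete_beta_int(0.6, 15, 25 - 15 + 1)
```

The two values agree to within the integration error, which is what allows (4) to be rewritten as (13).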

Within the framework of Bayesian decision theory, given a response pattern $(x_1,\ldots,x_k)$, it can
easily be verified from Table 1 that mastery is declared for the fixed-length mastery problem if
number-correct score $s_k$ is such that

$$(l_{10}+ke)\,P(T \le t_c \mid s_k) + (ke)\,P(T > t_c \mid s_k) < (ke)\,P(T \le t_c \mid s_k) + (l_{01}+ke)\,P(T > t_c \mid s_k), \tag{14}$$

and that nonmastery is declared otherwise. Rearranging terms, it can easily be verified from (14)
that mastery is declared if

$$P(T \le t_c \mid s_k) < 1/(1+R), \tag{15}$$

and that nonmastery is declared otherwise.
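The rearrangement from (14) to (15) can be written out as follows, with $p$ as shorthand (introduced here only for brevity) for the posterior probability of nonmastery. The last line agrees with (15) when $R$ stands for the loss ratio $l_{10}/l_{01}$; since $R$ is defined outside this section, that reading is an assumption here.

```latex
\begin{align*}
(l_{10}+ke)\,p + ke\,(1-p) &< ke\,p + (l_{01}+ke)\,(1-p),
  \qquad p = P(T \le t_c \mid s_k) \\
l_{10}\,p + ke &< l_{01}\,(1-p) + ke \\
(l_{10}+l_{01})\,p &< l_{01} \\
p &< \frac{l_{01}}{l_{01}+l_{10}} = \frac{1}{1 + l_{10}/l_{01}}.
\end{align*}
```

The cost term $ke$ cancels in the second step, so the cost of the items already administered does not affect which classification is preferred once testing stops. With equal losses the bound is $1/2$.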

Assuming a beta prior, it follows from an application of Bayes' theorem that under the
assumed binomial model from (1), the posterior distribution of $T$ will be a member of the beta
family again (the conjugacy property; see, e.g., Lehmann, 1959). In fact, if the beta distribution
$B(\alpha,\beta)$ with parameters $\alpha$ and $\beta$ ($\alpha, \beta > 0$) is chosen as the prior distribution and the student's
observed number-correct score is $s_k$ from a test of length $k$, then the posterior distribution function
of $T$ is $I_t(\alpha + s_k,\, k - s_k + \beta)$.
Hence, assuming a beta prior, it follows from (15) that mastery is declared if:

$$I_{t_c}(\alpha + s_k,\, k - s_k + \beta) < 1/(1+R), \tag{16}$$

and that nonmastery is declared otherwise.

Thus, comparing (13) and (16) with each other, it can be seen that the least favorable prior
for the minimax solution is given by a beta prior $B(\alpha,\beta)$ with $\beta = 1$ and $\alpha$ sufficiently small. It
should be noted that the parameter $\alpha > 0$ cannot be chosen equal to zero, because otherwise the
prior distribution for $T$ would be improper; that is, the prior would not integrate to 1 but to infinity.



Simulation of Different Strategies for Variable-Length Mastery Testing

In a Monte Carlo simulation the minimax sequential strategy will be compared with other
existing approaches to both sequential and adaptive mastery testing. More specifically, four
variable-length mastery testing strategies described in detail in Kingsbury and Weiss (1983) (see
also Weiss & Kingsbury, 1984) will be used here as a comparison in terms of average test length
(i.e., the number of items that must be administered on the average before a mastery/nonmastery
decision is made), correspondence between the simulated students' true mastery status and their
estimated mastery status as indexed by Loevinger’s coefficient H, and coefficient H as a
function of average test length.

Description of the testing strategies used for comparison

The first comparison will be made with a conventional fixed-length test (CT) in which student 
performance was recorded as proportion of correct answers (CT/PC). The student was declared a 
master for answering 60% or more items correctly after completion of the test, whereas nonmastery 
was declared otherwise. 

In order to determine whether the scoring method possibly accounts for differences between
a Bayesian-scored AMT algorithm and the CT/PC procedure, the second comparison will be made
with a conventional test where item responses were converted by Owen's Bayesian scoring
procedure (CT/B) to a latent ability on an IRT-metric, assuming a standard normal prior $N(0,1)$.
Mastery was declared if the final posterior estimate of the student's latent ability was higher
than the prespecified cut-off point on the latent IRT-metric corresponding to 60% correct;
otherwise nonmastery was declared. The cut-off point on the latent IRT-metric was hereby
determined by transforming the proportion-correct of 0.6 through the use of the test response
function (TRF), that is, the mean of the item response functions for all items in the pool.

The third comparison will be made with Wald's SPRT procedure. The limits of the
indifference region in which sampling will continue were set at proportion-correct values $p_0$ and $p_1$
of 0.5 and 0.7, respectively, whereas values of Type I and Type II error rates (i.e., $\alpha$ and $\beta$) were
each set equal to 0.1. According to the SPRT procedure, after $k$ items have been administered with
$s_k$ of them being answered correctly, mastery was declared if the likelihood ratio

$$\frac{L(x_1,\ldots,x_k \mid p_1)}{L(x_1,\ldots,x_k \mid p_0)} = \frac{(0.7)^{s_k}\,(0.3)^{k-s_k}}{(0.5)^{s_k}\,(0.5)^{k-s_k}}$$

was larger than $(1-\alpha)/\beta$, nonmastery if this likelihood ratio was smaller than $\alpha/(1-\beta)$, and
otherwise sampling was continued. With $\alpha = \beta = 0.1$, these bounds are 9 and 1/9, respectively.
For those students who could not be classified as either a master or nonmaster before the item pool
was exhausted, a classification decision was made in the same way as in the CT/PC procedure,
using a mastery proportion-correct value of 0.6.
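A minimal sketch of this SPRT classification rule is given below, using Wald's standard bounds $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$, which both reduce to 9 and 1/9 for the error rates used here. The function name and the log-scale formulation are choices made for this illustration, not details taken from Kingsbury and Weiss (1983).

```python
import math

def sprt_decision(k, s_k, p0=0.5, p1=0.7, alpha=0.1, beta=0.1):
    """Classify after k items with s_k correct, or ask for another item."""
    # Log likelihood ratio of p1 (mastery) against p0 (nonmastery).
    log_lr = (s_k * math.log(p1 / p0)
              + (k - s_k) * math.log((1 - p1) / (1 - p0)))
    upper = math.log((1 - beta) / alpha)   # evidence needed to declare mastery
    lower = math.log(beta / (1 - alpha))   # evidence needed to declare nonmastery
    if log_lr > upper:
        return "mastery"
    if log_lr < lower:
        return "nonmastery"
    return "continue"

decisions = [sprt_decision(10, 10), sprt_decision(10, 0), sprt_decision(2, 1)]
```

Ten correct answers out of ten give a likelihood ratio of $1.4^{10}$, well above 9, whereas one correct answer out of two leaves the ratio at 0.84, inside the continuation region.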

The fourth comparison will be made with an AMT strategy using a maximum information
item selection strategy with a symmetric Bayesian confidence interval of 90% and using Owen's
Bayesian scoring algorithm for a point estimation of the student's latent ability on an IRT-metric. As
in the CT/B procedure, a standard normal prior $N(0,1)$ was assumed for the Bayesian scoring of the
adaptive test. Also, as in the CT/B procedure, the prespecified cut-off points on the latent
IRT-metric (i.e., the mastery levels) in each of the 100-item pools corresponding to 60% correct were
determined from the TRF.

In order to make a fair comparison of the minimax sequential strategy with the four
strategies described above, the criterion level $t_c$ was set equal to 0.6. Furthermore, the losses $l_{01}$ and
$l_{10}$ associated with the incorrect classification decisions were assumed to be equal, corresponding to
the assumption of equal error rates in Wald's SPRT procedure. On a scale in which one unit
corresponded to the cost of administering one item (i.e., $e = 1$), $l_{01}$ and $l_{10}$ were each set equal to
200, reflecting the fact that costs for administering another random item were assumed to be rather
small relative to the costs associated with incorrect classification decisions. Finally, the parameter
$\alpha$ of the beta distribution $B(\alpha,1)$ as least favorable prior was set equal to $10^{-9}$.
Using the backward induction computational scheme discussed earlier, for given maximum
test length $n$, a computer program called MINIMAX was developed to determine the appropriate
action (i.e., nonmastery, mastery, continuing sampling) for the minimax sequential strategy at each
stage of sampling $k$ for different number-correct scores $s_k$. The recurrence relation
$\binom{k+1}{y+1} = \binom{k}{y} + \binom{k}{y+1}$, in combination with $\binom{k}{0} = 1$, was hereby used for computing the
binomial coefficients in (4). A copy of the program MINIMAX is available from the author upon request.
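A recurrence of this kind lets binomial coefficients be built up by additions only. The sketch below is one possible reading of that scheme, not the MINIMAX source; it applies Pascal's rule $\binom{k+1}{y+1} = \binom{k}{y} + \binom{k}{y+1}$ and, as an assumption of this sketch, sets both boundary entries $\binom{k}{0}$ and $\binom{k}{k}$ to 1.

```python
from math import comb

def binomial_row(k):
    """Return [C(k, 0), ..., C(k, k)] using additions only (Pascal's rule)."""
    row = [1]                               # C(0, 0)
    for _ in range(k):
        inner = [row[y] + row[y + 1] for y in range(len(row) - 1)]
        row = [1] + inner + [1]             # boundary entries are 1
    return row

row_25 = binomial_row(25)                   # coefficients for a 25-item test
```

Python integers do not overflow, so the result can be compared directly with `math.comb`; in a fixed-width language the additive scheme also avoids the large intermediate factorials.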



Item pools 

In the simulation study by Kingsbury and Weiss (1983), the simulations were conducted using four
100-item pools generated to reflect different types of item pools.

Pool 1 (uniform pool) consisted of items that were perfect replications of each other. More
specifically, each item had discrimination $a$ of 1, difficulty $b$ of 0, and lower asymptote $c$
(pseudo-guessing level) of 0.2. This item pool reflected the SPRT procedure's assumption that all items have
equal difficulty. As noted before, this assumption also reflects the choice of the binomial
distribution for modeling response behavior in the minimax sequential procedure.

Pool 2 ($b$-variable pool) varied from the uniform pool only in that the difficulties $b$ differed
across a range of values and reflected the 1-parameter IRT model (i.e., Rasch model).

Pool 3 ($a$- and $b$-variable pool) varied from the $b$-variable pool only in that the
discriminations $a$ differed across a range of values and was designed to simulate the 2-parameter
IRT model.

Pool 4 ($a$-, $b$-, and $c$-variable pool) varied from the $a$- and $b$-variable pool only in that the
lower asymptotes $c$ were allowed to spread across a range of values and simulated the 3-parameter
IRT model.

For a more detailed description of the four different item pools, refer to Kingsbury and
Weiss (1983).



Applying the Minimax Principle to Sequential Mastery Testing- 22 



Maximum test lengths 

Conventional tests (CTs) of three different lengths (10, 25, and 50 items) were randomly drawn 
from each of the four item pools. Doing so, the 10-item test served as the first portion of the 25- 
item test and the 25-item test in turn served as the first portion of the 50-item test. These 12 CTs 
served as subpools from which the SPRT, AMT, and minimax sequential procedures drew items 
during the simulations. 

It is important to notice that this random sampling from a larger domain of items implies
that the binomial model assumed in both Wald's SPRT and the minimax sequential procedure
holds. Thus, not only for the uniform pool but also for the $b$-variable, $a$- and $b$-variable, and $a$-, $b$-,
and $c$-variable pool, the assumed binomial model holds in these two testing strategies.

Item response generation 

Item responses for 500 simulated students, drawn from a $N(0,1)$ distribution, were generated for
each item in each of the four item pools. For known ability of the simulated student and given item
parameters, first the probability of a correct answer was calculated using the 3-PL model. Next, this
probability was compared with a random number drawn from a uniform distribution in the range
from 0 to 1. The item administered to the simulated student was scored correct if this random
number was less than the probability of a correct answer, and incorrect if it was greater.

Furthermore, a simulated student was taken to be a "true" master if the ability used
to generate his/her item responses was higher than a prespecified cut-off point on the $N(0,1)$ ability
metric. Since a value of 0.6 on the proportion-correct metric of each of the four item pools
corresponded, after conversion, to a value of 0 on the $N(0,1)$ ability metric, the cut-off point on the
$N(0,1)$ ability metric was set equal to 0.
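The response-generation step can be sketched as follows for the uniform pool ($a = 1$, $b = 0$, $c = 0.2$). The logistic form of the 3-PL model and the scaling constant `D` (set to 1 here) are assumptions of this sketch, since the exact parameterization is not given in this section; the seed and function names are likewise illustrative.

```python
import math
import random

def p_correct_3pl(theta, a, b, c, D=1.0):
    """3-PL probability of a correct answer (logistic form, ASSUMED scaling D)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def simulate_response(theta, a, b, c, rng):
    """Score 1 if a uniform(0, 1) draw falls below the 3-PL probability, else 0."""
    return 1 if rng.random() < p_correct_3pl(theta, a, b, c) else 0

rng = random.Random(2024)                                  # fixed seed for repeatability
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]         # 500 simulees from N(0, 1)
responses = [[simulate_response(t, 1.0, 0.0, 0.2, rng) for _ in range(100)]
             for t in thetas]                              # 100 uniform-pool items each
true_master = [t > 0.0 for t in thetas]                    # cut-off 0 on the ability metric
```

For the uniform pool, an ability of 0 gives a probability of $0.2 + 0.8 \times 0.5 = 0.6$ whatever the value of `D`, which is consistent with the proportion-correct cut-off of 0.6 corresponding to 0 on the ability metric.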
Results of the Monte Carlo Simulation

In this section, the results of the Monte Carlo simulations will be compared for the different 
variable-length mastery testing strategies in terms of average test length, correspondence with true 
mastery status (i.e., classification accuracy), and correspondence as a function of average test 
length (i.e., efficiency of testing strategy). 

Average test lengths 

Table 2 shows the average number of items required by each of the variable-length mastery testing 
strategies before a mastery/nonmastery decision can be made. The minimax sequential testing 
strategy is hereby denoted as MINI. 

Insert Table 2 about here 

As can be seen from Table 2, the MINI strategy resulted in considerable reductions in average
test length for each combination of item pool and maximum test length (MTL). Table 2 also shows
that, except for the SPRT strategy with the $a$-, $b$-, and $c$-variable pool at the 50-item MTL level, the
MINI procedure resulted in a greater reduction of average test lengths than the conventional, AMT,
and SPRT strategies for each item pool at all MTL levels. Finally, as under the other strategies, it
can be inferred from Table 2 that for each item pool the reduction in average test length under the
MINI strategy increased with increasing MTL. For the uniform pool, the average test length was
reduced by 36%, 54%, and 71% for the 10-item MTL, 25-item MTL, and 50-item MTL,
respectively. For the $b$-variable pool, $a$- and $b$-variable pool, and $a$-, $b$-, and $c$-variable pool, these
percentages in average test length reduction were (25%; 44%; 61%), (41%; 57%; 68%), and (28%;
50%; 65%), respectively. Hence, under the MINI strategy, the greatest reductions in average test
length were achieved for the $a$- and $b$-variable pool and the uniform pool.
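The reduction percentages follow directly from Table 2 as one minus the ratio of mean test length to maximum test length. A short sketch for the uniform pool is given below; the dictionary layout and function name are illustrative only.

```python
def percent_reduction(mean_length, max_length):
    """Reduction in average test length relative to the maximum, in percent."""
    return 100.0 * (1.0 - mean_length / max_length)

# MINI rows of Table 2 for the uniform pool, keyed by maximum test length.
mini_uniform = {10: 6.41, 25: 11.47, 50: 14.49}
reductions = {mtl: round(percent_reduction(mean, mtl))
              for mtl, mean in mini_uniform.items()}
```

The same function applied to the other MINI rows of Table 2 gives the percentages reported for the remaining pools.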

Classification accuracy 

Kingsbury and Weiss (1983) and Weiss and Kingsbury (1984) used phi correlations between true 
classification status (i.e., true master or true nonmaster) and estimated classification status (i.e., 
declaring mastery or nonmastery) as indicators of the quality/validity of the classification decisions. 

Therefore, these authors denoted the phi correlations as measures of classification accuracy. 
However, the phi coefficient is not appropriate for the assessment of classification accuracy.
The reason is that phi is sensitive to unequal proportions of true and declared masters (see Lord &
Novick, 1968, sect. 15.9). Van der Linden and Mellenbergh (1978) proposed coefficient delta for
the assessment of classification accuracy, which is not sensitive to unequal proportions of true and
declared masters. They showed that delta reduces to the well-known Loevinger’s coefficient H if
the threshold loss function satisfies $l_{00} = l_{11} = 0$ and $l_{01} = l_{10}$. Since the losses for the correct
classification decisions were assumed to be equal to zero and the losses for the incorrect
classification decisions were both set equal to 200, it follows that coefficient H applies to our
simulation study. Coefficient H is defined as phi/phi(max), where phi(max) is the maximum of phi given the
marginal distributions of the 2x2 table. Although coefficient delta is not always in the interval
from 0 to 1, it has been shown (van der Linden & Mellenbergh, 1978) that coefficient H
is in this interval. A value of 0 signifies that the test is worthless, and a value of 1 signifies that the
test is perfect for the decision situation.
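The definition of H as phi/phi(max) can be illustrated with a small sketch. Here phi(max) is obtained by moving as many cases as the fixed marginals allow into the cell of correctly declared masters; the cell labels and function names are assumptions made for this example, and the marginals are assumed to be nonzero.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient of a 2x2 table with cell counts
    a: true master, declared master      b: true master, declared nonmaster
    c: true nonmaster, declared master   d: true nonmaster, declared nonmaster
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def loevinger_h(a, b, c, d):
    """H = phi / phi(max), with phi(max) taken over tables with the same marginals."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    a_max = min(row1, col1)              # largest count the marginals allow in cell a
    b_max, c_max = row1 - a_max, col1 - a_max
    d_max = n - a_max - b_max - c_max
    return phi(a, b, c, d) / phi(a_max, b_max, c_max, d_max)

h_example = loevinger_h(30, 20, 10, 40)  # hypothetical counts for 100 simulees
```

For these hypothetical counts phi equals about 0.41 while H equals 0.5, which shows how H corrects phi for the unequal proportions of true and declared masters (50 versus 40 out of 100).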
Insert Table 3 about here

Table 3 shows Loevinger’s coefficients H for the present simulation study. As can be seen
from Table 3, only for the $a$-, $b$-, and $c$-variable pool did the MINI strategy result in higher
coefficients H than the other four testing strategies at all MTL levels. In particular, for the 10-item
MTL the coefficients H were considerably higher. For both the $b$-variable and $a$- and $b$-variable
pool, the other four testing strategies generally yielded somewhat higher coefficients H. For the
uniform pool, however, the other four testing strategies yielded considerably higher coefficients H.

Furthermore, Table 3 shows that under the MINI strategy the coefficients H for both the 25-item
and 50-item MTL were higher than for the 10-item MTL for each pool type. For both the $a$-
and $b$-variable pool and the $a$-, $b$-, and $c$-variable pool, under the MINI strategy, the 50-item MTL
yielded higher coefficients H than the 25-item MTL, whereas the opposite held for both the
uniform and $b$-variable pool.

Most efficient testing strategy 

Kingsbury and Weiss (1983) depicted graphically the phi correlation as a function of the average 
number of items administered by each testing strategy for each item pool (see also Weiss and 
Kingsbury, 1984). In other words, they matched the average test length on the classification 
accuracy. From these graphs conclusions were derived concerning which testing strategy was most 
efficient. A testing strategy was hereby said to be most efficient if it results in the combination of 
highest phi correlation and shortest average test length. Following Kingsbury and Weiss (1983) and 
Weiss and Kingsbury (1984), a testing strategy will be called most efficient in this paper if it results 
in the combination of highest Loevinger’s coefficient H and shortest average test length. 

As is immediately clear from Tables 2 and 3, the MINI strategy was the most efficient of all
testing procedures for the (realistic) $a$-, $b$-, and $c$-variable pool, since it generally yielded both the
highest coefficients H and shortest average test lengths at each MTL level. Although the SPRT
strategy required, on the average, somewhat fewer items at the 50-item MTL level for reaching a
mastery/nonmastery decision than the MINI strategy (i.e., 15.70 versus 17.27), the
coefficient H for the SPRT strategy was much lower than for the MINI strategy (i.e., 0.678
versus 0.948). For an average test length of 15.70 (interpolating from the data in Tables 2 and 3),
the MINI strategy would result in a coefficient H of 0.804.
For the $a$- and $b$-variable pool, as can be seen from Tables 2 and 3, the MINI strategy yielded
shorter mean test lengths than all other strategies, whereas the coefficients H were generally
somewhat lower at each MTL level. The MINI strategy resulted in a coefficient H of 0.792 at a
mean test length of 15.96 (the longest mean test length observed at the 50-item MTL level).
Interpolating data from Tables 2 and 3, it can easily be verified that the SPRT procedure would
need to administer 16.47 items to achieve this same coefficient H of 0.792, the AMT procedure
would need 15.07 items, the CT/B procedure would need 24.63 items, and the CT/PC procedure
would need 21.48 items. Hence, for the $a$- and $b$-variable pool, the MINI procedure was
considerably more efficient than both the CT/PC and CT/B strategies, whereas the MINI procedure
was somewhat more efficient than the SPRT procedure. Compared to the AMT procedure,
however, the MINI procedure was somewhat less efficient.

For the $b$-variable pool, Tables 2 and 3 show that at the longest mean test length observed
for the MINI procedure (i.e., 19.48 at the 50-item MTL level), this strategy resulted in a coefficient
H of 0.689. Interpolating data from Tables 2 and 3, it follows that the SPRT procedure would need
to administer 16.08 items to achieve this same coefficient H of 0.689, the AMT procedure would
need 12.89 items, the CT/B procedure would need 23.95 items, and the CT/PC procedure would
need 20.42 items. Hence, for the $b$-variable pool, it can be concluded that the MINI procedure was
considerably more efficient than the CT/B procedure and somewhat more efficient than the CT/PC
procedure. On the other hand, the MINI procedure was somewhat less efficient than the
SPRT procedure and considerably less efficient than the AMT procedure.
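The interpolation step can be illustrated with piecewise-linear interpolation between the (mean test length, coefficient H) pairs taken from Tables 2 and 3 for the $b$-variable pool. That the interpolation was linear is an assumption of this sketch; for the SPRT, AMT, and CT/PC rows it returns the item counts quoted above. The function name is illustrative.

```python
def items_needed(points, target_h):
    """Mean test length at which H reaches target_h, by linear interpolation.

    `points` is a list of (mean test length, H) pairs ordered by test length.
    Returns None if target_h is not bracketed by two neighbouring points.
    """
    for (n0, h0), (n1, h1) in zip(points, points[1:]):
        if min(h0, h1) <= target_h <= max(h0, h1):
            return n0 + (target_h - h0) / (h1 - h0) * (n1 - n0)
    return None

# b-variable pool: lengths from Table 2, coefficients H from Table 3.
b_pool = {
    "SPRT":  [(9.62, 0.607), (16.79, 0.698), (21.41, 0.788)],
    "AMT":   [(9.43, 0.649), (18.09, 0.749), (27.17, 0.879)],
    "CT/PC": [(10.00, 0.614), (25.00, 0.722), (50.00, 0.837)],
}
needed = {name: round(items_needed(pts, 0.689), 2) for name, pts in b_pool.items()}
```

Comparing these lengths with the 19.48 items used by the MINI strategy at the same coefficient H gives the efficiency ordering described in the text.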

Finally, it can be inferred from Tables 2 and 3 that the MINI strategy resulted for the
uniform pool in a coefficient H of 0.721 at the longest mean test length observed (i.e., 14.49 at the
50-item MTL level). It follows immediately from Tables 2 and 3 that each of the four other testing
strategies would need to administer fewer than 10 items to achieve this same coefficient H of 0.721.
Hence, for the (unrealistic) uniform pool, it can be concluded that the MINI procedure was
considerably less efficient than the four other testing strategies.
Discussion

Optimal rules for the sequential mastery problem (nonmastery, mastery, and to continue
sampling) were derived using the framework of minimax sequential decision theory. The binomial
distribution was assumed for modeling response behavior, whereas threshold loss was adopted for
the loss function involved. The least favorable prior, used in the present paper for computing the
posterior predictive distributions, turned out to be the beta distribution with parameter $\beta$ equal to 1
and parameter $\alpha$ sufficiently small.

In a Monte Carlo simulation, the minimax sequential procedure (MINI) was compared with 
other procedures that exist for both sequential and adaptive mastery testing in the literature. 
Maximum test length (MTL) varied from 10 to 50 items, and different types of item pools were 
considered by changing the values of the item parameters. 

The results of the simulation study indicated that, compared to the other testing strategies
examined in the literature, the MINI strategy was most efficient (i.e., combination of highest
Loevinger’s coefficient H between true and estimated mastery status and shortest average test
length) for item pools reflecting the (realistic) 3-PL model at each MTL level. Also, except for the
AMT strategy, the MINI strategy turned out to be most efficient for item pools reflecting the
2-PL model at each MTL level. For item pools reflecting the 1-PL model (i.e., the Rasch model), the
MINI strategy appeared to be more efficient than the two conventional fixed-length methods (i.e.,
employing proportion correct and a Bayesian scoring method for making mastery/nonmastery
decisions) but less efficient than both the AMT and SPRT procedures at each MTL level. For the
(unrealistic) uniform item pools, however, it turned out that the MINI strategy was less efficient
than the other testing strategies at each MTL level.

It is important to notice, however, that the MINI strategy is especially appropriate when
costs of testing can be assumed to be quite large, for instance, when testlets rather than single items
are considered. Also, the MINI strategy might be appropriate in psychodiagnostics. Suppose that a
new treatment (e.g., cognitive-analytic therapy) must be tested on patients suffering from some
mental health problem (e.g., anorexia nervosa). Each time after having exposed a patient to the new
treatment, it is desired either to make a decision concerning the effectiveness/ineffectiveness of the
new treatment or to test another patient. In such clinical situations, costs of testing generally are quite
large and the MINI approach might be considered as an alternative to other testing strategies, such
as SPRT, AMT, or conventional fixed-length tests.
An issue that still deserves some attention is why in the present paper, somewhat counter to
the current trend in applied measurement, a random rather than IRT-based adaptive item selection
procedure is preferred. As noted before, IRT-based item selection strategies assume that a
calibrated pool of items exists which differ in their particular characteristics (i.e., levels of difficulty
and discrimination). For random item selection strategies, such as Wald's SPRT procedure and the
minimax sequential procedure advocated in this paper, however, only the existence of a pool of
parallel items is required. Such pools of parallel items often are easier to construct than pools of
items that differ in their IRT characteristics.

In case a calibrated pool of items does exist, however, an IRT-based adaptive strategy that
selects items for administration based on their particular characteristics is preferable to
randomly selecting items from a pool. A promising approach, in which the strong point of the minimax
and Bayesian sequential procedures (i.e., taking cost per observation explicitly into account) is
combined with an IRT-based adaptive item selection strategy, might be the following. The item to
be administered next is the one that maximizes information or minimizes posterior variance at
the student’s last ability estimate on an IRT-metric. At each stage of sampling, the action (declaring
mastery, declaring nonmastery, or continuing sampling) is then chosen that minimizes the
posterior or maximum expected losses associated with all possible decision rules (see also Vos &
Glas, 2000).

A final note is appropriate. Following the same line of reasoning as in the present paper, 
optimal rules derived here can easily be generalized to the situation where three or more mutually 
exclusive classification categories can be distinguished. In Weiss and Kingsbury (1984), it is 
indicated how the AMT procedure can be employed in the context of allocating students to more 
than two grade classes (i.e., adaptive grading test). Spray (1993) has shown how a generalization of 
Wald’s SPRT procedure (i.e., Armitage’s (1950) combination procedure) can be applied to multiple 
categories, whereas Bayesian sequential decision theory is applied in Vos (1999) to SMT in case 
the three classification decisions declaring nonmastery, partial mastery, and mastery are open to the 
decision-maker (see also Smith & Lewis, 1995). 
Table 1

Table for threshold loss function at stage $k$ ($1 \le k \le n$) of sampling

                                  True Level of Functioning
Action                         $T \le t_c$          $T > t_c$

$a_0(x_1,\ldots,x_k)$          $ke$                 $l_{01} + ke$
$a_1(x_1,\ldots,x_k)$          $l_{10} + ke$        $ke$
Table 2

Mean Number of Items Administered to Each Simulee for Four Mastery Testing Strategies Using Each Item
Pool, at Three Maximum Test Lengths

Item pool and                        Maximum test length
testing strategy                   10         25         50

Uniform pool
  Conventional                  10.00      25.00      50.00
  AMT                            9.03      15.99      23.00
  SPRT                           8.75      13.12      15.39
  MINI                           6.41      11.47      14.49

b-variable pool
  Conventional                  10.00      25.00      50.00
  AMT                            9.43      18.09      27.17
  SPRT                           9.62      16.79      21.41
  MINI                           7.55      14.08      19.48

a- and b-variable pool
  Conventional                  10.00      25.00      50.00
  AMT                            8.55      15.78      24.07
  SPRT                           9.41      15.78      18.55
  MINI                           5.86      10.86      15.96

a-, b-, and c-variable pool
  Conventional                  10.00      25.00      50.00
  AMT                            8.73      16.35      23.39
  SPRT                           8.62      13.42      15.70
  MINI                           7.18      12.61      17.27
Table 3

Loevinger’s Coefficients H between Observed Mastery Status and True Mastery Status for Each Mastery
Testing Strategy, Using Each Type of Item Pool, at Three Maximum Test Lengths

Item pool and                        Maximum test length
testing strategy                   10         25         50

Uniform pool
  CT/PC                         0.862      0.901      0.923
  CT/B                          0.813      0.887      0.919
  AMT                           0.873      0.912      0.920
  SPRT                          0.856      0.908      0.916
  MINI                          0.612      0.824      0.721

b-variable pool
  CT/PC                         0.614      0.722      0.837
  CT/B                          0.609      0.691      0.848
  AMT                           0.649      0.749      0.879
  SPRT                          0.607      0.698      0.788
  MINI                          0.578      0.709      0.689

a- and b-variable pool
  CT/PC                         0.691      0.801      0.832
  CT/B                          0.696      0.799      0.834
  AMT                           0.691      0.803      0.829
  SPRT                          0.683      0.789      0.801
  MINI                          0.671      0.765      0.792

a-, b-, and c-variable pool
  CT/PC                         0.389      0.781      0.841
  CT/B                          0.413      0.817      0.889
  AMT                           0.408      0.809      0.878
  SPRT                          0.378      0.689      0.678
  MINI                          0.748      0.892      0.948